Weighted Network of Chinese Nature Science Basic Research

Using the requisition papers of Chinese Nature Science Basic Research in the management and
information department, we construct the weighted network of research areas (WRAN), in which
the nodes are the subject codes. In WRAN, two research areas are considered connected if they
have been filled in together in at least one requisition paper. The edge weight is defined as
the number of requisition papers that have filled in the same pair of codes. The node strength
is defined as the number of requisition papers that have filled in this code, including the
papers that have filled in only this code. Here we study a variety of nonlocal statistics for
these networks, such as typical distances between research areas through the network, and
measures of centrality such as betweenness. These statistical characteristics can illuminate
the global development trend of Chinese scientific study, and they are also helpful for
adjusting the code system to reflect the real status more accurately. Finally, we present a
plausible model for the formation and structure of networks with the observed properties.

PACS numbers: 89.75.Hc, 89.75.Da



I. INTRODUCTION 

Recently, the topological properties and evolutionary processes of complex networks have been
used to describe relationships and collective behaviors in many fields [...]. Network analysis
has produced new analysis methods and topological properties, and it has also encouraged the
study of complex systems from a macroscopic point of view. A network consists of a set of nodes
and a set of edges, each of which represents the relationship between two nodes. A topological
network is described by an adjacency matrix $W = (w_{ij})$: if node i is connected to node j,
$w_{ij} = 1$; otherwise, $w_{ij} = 0$. Because of the simplicity of this description, networks
can be used in many different subjects, such as the collaboration of scientists [...], the
Internet [...], the World-Wide Web [...], the bipartite network of collaborative research and
projects [...], and so on. Barber et al. [...] studied the collaboration network consisting of
research projects funded by the European Union and the organizations involved. They found that
the collaboration network has the main characteristics of complex networks, such as a
scale-free degree distribution, small average distance, high clustering and assortative node
correlations. However, real systems are far from Boolean structures. A purely topological
characterization misses important attributes often encountered in real systems. To fully
characterize the interactions in real-world networks, the weights of links should therefore be
taken into account. In fact, there are already many works on weighted networks, including
empirical studies [...] and evolutionary models [...].

The empirical study of weighted networks without a naturally given definition of weight is
especially valuable for answering questions such as how to define a well-behaved weight, how to
extract structural information from networks, and what role the weight plays through its
effects on the structure of the network. We introduce some metrics that combine in a natural
way both the topology of the connections and the weight assigned to them. These quantities
provide a general characterization of the heterogeneous statistical properties of weights and
identify alternative definitions of centrality, local cohesiveness, and affinity. By
appropriate measurements it is also possible to exploit the correlation between the weights and
the topological structure of the network, unveiling the complex architecture shown by real
weighted networks.

Scientific studies can be considered as being organized within a network structure, which has a
significant influence on the observed collective behaviors of research. The viewpoint of
complex networks is therefore of interest for uncovering the structural characteristics of
WRAN. Its topological statistical properties have been discussed in Ref. [...]. In fund
management departments, such as the National Natural Science Foundation of China (NSFC), the
research areas are denoted by a code system, which has a tree structure to express the
inclusion relations between research areas, such as Physics -> statistical physics -> complex
network. The leaf codes of the code system represent the research areas most specifically. To
make the network reflect reality more accurately, the nodes are defined as the codes.
Scientists can fill in two codes in a fund proposal: the first application code and the second
one. If a requisition paper fills in two different codes, one can consider that the research
work crosses the two research areas. The edge weight $w_{ij}$ between nodes i and j is defined
as the number of papers that filled in both codes. The node strength $s_i$ is defined as the
number of requisition papers that filled in code i, including the papers that filled in only
code i. By this definition, the network size of WRAN from 1999 to 2004 is 321. The network
shows the main characteristics known from other complex networks, such as small average path
length, large clustering, and assortative node correlations, together with exponential
distributions of degree, node weight and node strength. Besides the general interest in
studying a new network, the study could help us understand how the network structure affects
network functions such as knowledge creation, knowledge diffusion and the collaboration of
scientists. Moreover, the macroscopic analysis can illuminate the global development trend of
Chinese scientific study, and it is also helpful for adjusting the code system to reflect the
real status more accurately.
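The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch: the input format (a list of `(first_code, second_code)` pairs, with `None` when no second code is given) and the name `build_wran` are assumptions made for the example, not the authors' actual data layout.

```python
from collections import defaultdict


def build_wran(proposals):
    """Build edge weights, fitness and strength from requisition papers.

    proposals: iterable of (first_code, second_code) pairs; second_code is
    None (or equal to first_code) when only one code was filled in.
    This input format is a hypothetical choice for the sketch.
    """
    edge_weight = defaultdict(int)  # w_ij, keyed by a sorted pair of codes
    fitness = defaultdict(int)      # papers that filled in code i only
    for first, second in proposals:
        if second is None or second == first:
            fitness[first] += 1
        else:
            edge_weight[tuple(sorted((first, second)))] += 1

    # Node strength: all papers that filled in code i, alone or in a pair.
    strength = defaultdict(int)
    for (i, j), w in edge_weight.items():
        strength[i] += w
        strength[j] += w
    for i, count in fitness.items():
        strength[i] += count
    return dict(edge_weight), dict(fitness), dict(strength)


if __name__ == "__main__":
    papers = [("A", "B"), ("B", "A"), ("A", None), ("B", "C")]
    weights, fitness, strength = build_wran(papers)
    print(weights, fitness, strength)
```

Sorting each pair makes the network undirected, so a paper listing (A, B) and one listing (B, A) contribute to the same edge.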

II. MEASUREMENT OF WEIGHT AND BASIC 
STATISTICAL RESULTS 

Now we turn to the effects of weight on the structure of weighted networks. First, the
interaction weight $w_{ij}$ is defined as the number of requisition papers that have filled in
both code i and code j. The strength $s_i$ of node i is defined as

    $$ s_i = \sum_{j \in \Gamma(i)} w_{ij} + \eta_i, \tag{1} $$

where $\Gamma(i)$ is the set of neighbor nodes of node i and the fitness $\eta_i$ is the number
of requisition papers that filled in code i only. The weight $w_i$ of node i is defined as

    $$ w_i = \sum_{j \in \Gamma(i)} w_{ij}. \tag{2} $$

This quantity measures the strength of nodes in terms of the total weight of their connections.
The distributions of degree, node weight and node strength are shown in Fig. 1. The probability
distribution P(s) that a node has strength s is exponential, and its functional behavior is
similar to that of the degree distribution P(k) (see Fig. 1). The nodes with the largest
strength are listed in Table 1.

A precise functional description of the exponential distributions may be very important for
understanding the network evolution and is deferred to future analysis. To shed more light on
the relationship between node strength and degree, we investigate the dependence of $s_i$ on
$k_i$. We find that the average strength s(k) and the average weight w(k) of nodes with degree
k increase with the degree as

    $$ s(k) \sim k^{\beta_{sk}}, \qquad w(k) \sim k^{\beta_{wk}}. \tag{3} $$

The real data follow this power-law behavior with exponents $\beta_{sk} = 1.14 \pm 0.02$ and
$\beta_{wk} = 1.12 \pm 0.01$ (see Fig. 2). The two exponents indicate anomalous correlations
between the number of papers that have filled in a node and the number of its connections: the
strength and weight of nodes grow faster than their degree, and the edges belonging to highly
connected nodes tend to have higher weights. This tendency denotes a strong correlation between
strength, node weight and the topological properties of WRAN. The difference between
$\beta_{sk}$ and $\beta_{wk}$ implies that the larger the degree of a node is, the larger its
fitness $\eta_i$ is.
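As an illustration of how an exponent such as $\beta_{sk}$ can be estimated, the sketch below averages a node quantity over nodes of equal degree and fits a straight line in log-log coordinates by ordinary least squares. The fitting procedure actually used for the quoted exponents is not stated in the text, so the least-squares choice and the function names are assumptions.

```python
import math
from collections import defaultdict


def average_by_degree(degree, value):
    """Return {k: average of value[node] over nodes with degree k > 0}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for node, k in degree.items():
        if k > 0:
            sums[k] += value[node]
            counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


def fit_power_law_exponent(avg_by_k):
    """Least-squares slope of log(value) versus log(k).

    Assumes the relation value(k) ~ k**beta; the slope estimates beta.
    """
    points = [(math.log(k), math.log(v))
              for k, v in avg_by_k.items() if k > 0 and v > 0]
    if len(points) < 2:
        raise ValueError("need at least two distinct degrees to fit a slope")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic check: data generated with exponent 1.14 should return it.
    synthetic = {k: 2.0 * k ** 1.14 for k in range(1, 30)}
    print(round(fit_power_law_exponent(synthetic), 3))
```

A log-log least-squares fit is simple but sensitive to noisy high-degree bins, so it should be read as a first estimate rather than a precise measurement.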



Table 1. The hub nodes of WRAN and their strength from 1999 to 2004.

Year  Hub node                                                     s
1999  Corporation theory                                         178
2000  Macroscopical economy management and stratagem              79
2001  Corporation stratagem management                            93
2002  Computer network, distributed computer system (CNDCS)       83
2003  CNDCS                                                      132
2004  CNDCS                                                      194



A. Distance and Centrality 

Shortest paths play an important role in transport and communication within a network, and
they also play an important role in characterizing the internal structure of a network
[27, 28]. The average distance, denoted by $D = \frac{1}{N(N-1)} \sum_{i \neq j} d_{ij}$, is
the average of all shortest path lengths of a network, where the entry $d_{ij}$ is the shortest
path length from node i to node j. It should be noted that the network nodes are not all
connected in any of the six years. The largest connected group has 256, 279, 293, 290, 309 and
310 nodes in 1999 to 2004, respectively. The average distance is computed on the largest
connected group. The ability of two nodes i and j to communicate with each other depends on
the length of the shortest path $d_{ij}$ between them. The average distance from node i to all
other nodes is defined as

    $$ D_i = \frac{1}{N-1} \sum_{j \neq i} d_{ij}. \tag{4} $$

In the Boolean-structure network, if nodes i and j are connected, $d_{ij} = 1$. In WRAN, the
larger the edge weight $w_{ij}$ is, the closer the relationship between the two nodes is. Thus,
the weighted distance between two connected nodes is taken as $d_{ij} = 1/w_{ij}$. The weighted
shortest path length $d_{ij}$ of WRAN is defined as the smallest sum of these distances over
all possible paths in the network from node i to node j. Figures 3 and 4 show the topological
and weighted $D_i$ distributions from 1999 to 2004, respectively, both of which obey a Poisson
distribution. From the two figures, we find that the average distances $D_i$ of most nodes are
around 3.5 and 2.2 in the topological and weighted networks, respectively. The nodes in the
left part of the Poisson distribution are very important to the network, because their average
distance to all other nodes is very small. The two inset figures show that the average distance
D of both the topological and the weighted network decreases with time. This may be caused by
the increase of the average degree $\langle k \rangle$ (see Fig. 5). Since the number of
requisition papers E can be obtained approximately from the equation
$E = N \langle k \rangle$, the real reason why the average distance decreases may lie in the
increasing number of requisition papers.

FIG. 1: (Color online) Characteristics of WRAN, such as the distributions of degree, node
weight and node strength.

FIG. 2: (Color online) Average strength s(k) as a function of the degree k of nodes from 1999
to 2004. The inset shows the relationship between the average node weight w(k) and the
degree k.

FIG. 3: (Color online) The topological $D_i$ distributions from 1999 to 2004 obey a Poisson
distribution. The inset shows the average distance D from 1999 to 2004.



FIG. 4: (Color online) The weighted $D_i$ distributions of WRAN from 1999 to 2004 obey a
Poisson distribution. The inset shows the average distance D of the weighted WRAN from 1999
to 2004.
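The weighted distance used in this subsection, with edge length $1/w_{ij}$, can be computed with Dijkstra's algorithm. The following is a minimal sketch under assumed conventions: the graph is a dictionary `{node: {neighbor: w_ij}}`, and the average $D_i$ is taken over the nodes reachable from i, which matches restricting the analysis to one connected group. The function names are illustrative.

```python
import heapq


def weighted_distances(adj, source):
    """Dijkstra shortest distances from source, with edge length 1 / w_ij.

    adj: {node: {neighbor: w_ij}} for an undirected weighted graph.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            candidate = d + 1.0 / w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


def average_distance(adj, i):
    """Average distance D_i from node i to every other reachable node."""
    dist = weighted_distances(adj, i)
    others = [d for node, d in dist.items() if node != i]
    if not others:
        raise ValueError("node has no reachable neighbors")
    return sum(others) / len(others)


if __name__ == "__main__":
    adj = {
        "A": {"B": 2, "C": 1},
        "B": {"A": 2, "C": 4},
        "C": {"A": 1, "B": 4},
    }
    # The direct edge A-C has length 1, but the path A-B-C has length
    # 1/2 + 1/4 = 0.75, so heavily weighted edges create shortcuts.
    print(average_distance(adj, "A"))
```

Setting every weight to 1 recovers the topological distance, so the same routine covers both the Boolean and the weighted case.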



B. Average Clustering coefficient 

The local clustering coefficient of node i, denoted by $C_i$, is a measure of the connectedness
between the neighbors of the node, which is called transitivity in social networks [...]. If
node i has a link to node j and node j has a link to node k, then a measure of transitivity in
the network is the probability that node i also has a link to node k. Let $k_i$ denote the
degree of node i, and let $E_i$ denote the number of links between its $k_i$ neighbors. Then,
for an undirected network, the quantity [...]

    $$ C_i = \frac{2 E_i}{k_i (k_i - 1)} \tag{5} $$

is the ratio of the number of links between a node's neighbors to the number of links that
could exist. The clustering coefficient C of the network is defined as
$C = \frac{1}{N} \sum_{i=1}^{N} C_i$. In WRAN, the clustering coefficient indicates the
probability that a node connects to its second-nearest neighbors. Figure 6 presents the
statistical result for C(k) versus k. From Fig. 6 we find that there is no correlation between
C(k) and k before 2003, but that a correlation has emerged since 2003, which is a
characteristic of hierarchical networks. The reason may lie in the fact that the code system
was adjusted around 2002. This result indicates that the adjustment made the relationships
between the subject codes clearer.

FIG. 5: (Color online) Average degree $\langle k \rangle$ from 1999 to 2004, which increases
by a factor of almost 2 over this period.

FIG. 6: (Color online) The topological clustering coefficient vs time from 1999 to 2004.

FIG. 7: (Color online) The weighted clustering coefficient vs time from 1999 to 2004.
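The ratio of existing to possible links among a node's neighbors can be computed directly. The sketch below assumes an undirected graph stored as `{node: set of neighbors}` and assigns $C_i = 0$ to nodes with fewer than two neighbors; that convention is an assumption, since the text does not say how such nodes are treated.

```python
from itertools import combinations


def local_clustering(neighbors, i):
    """Local clustering C_i = 2 E_i / (k_i (k_i - 1)).

    neighbors: {node: set of neighbor nodes} for an undirected graph.
    Nodes with degree below 2 get 0.0 (an assumed convention).
    """
    nbrs = neighbors[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # E_i: number of links that exist between pairs of neighbors of i.
    e_i = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
    return 2.0 * e_i / (k * (k - 1))


def average_clustering(neighbors):
    """Network clustering coefficient C = (1/N) * sum_i C_i."""
    return sum(local_clustering(neighbors, i) for i in neighbors) / len(neighbors)


if __name__ == "__main__":
    # A triangle A-B-C with one pendant node D attached to A.
    graph = {
        "A": {"B", "C", "D"},
        "B": {"A", "C"},
        "C": {"A", "B"},
        "D": {"A"},
    }
    print(local_clustering(graph, "A"), average_clustering(graph))
```

Grouping the values of `local_clustering` by degree gives C(k), the quantity whose dependence on k is discussed in the text.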

The weighted clustering coefficient is defined as

    $$ c_i^w = \frac{1}{s_i (k_i - 1)} \sum_{j,h} \frac{w_{ij} + w_{ih}}{2} a_{ij} a_{ih} a_{jh}, \tag{6} $$

where $a_{ij} = 1$ if nodes i and j are connected and $a_{ij} = 0$ otherwise. This coefficient
is a measure of the local cohesiveness that takes into account the importance of the clustered
structure on the basis of the amount of traffic or interaction intensity actually found on the
local triplets. Indeed, $c_i^w$ counts, for each triplet formed in the neighborhood of node i,
the weight of the two participating edges of node i. In this way we consider not just the
number of closed triplets in the neighborhood of a node but also their total relative weight
with respect to the strength of the node. Consistently, the $c_i^w$ definition recovers the
topological clustering coefficient in the case that $w_{ij}$ is constant and $\eta_i = 0$.
Next we define $C^w$ and $C^w(k)$ as the weighted clustering coefficient averaged over all
nodes of the network and over all nodes with degree k, respectively. These quantities provide
global information on the correlation between weights and topology, especially when compared
with their topological analogs. Figure 7 presents the power-law correlation
$C^w(k) \sim k^{\alpha}$ between $C^w(k)$ and the degree k, where
$\alpha = -2.15 \pm 0.06$, which may be caused by the introduction of the node fitness
$\eta_i$. Because a node with larger degree k tends to have larger $\eta_i$, the denominator of
Eq. (6) becomes larger and $C^w(k)$ becomes smaller. If $s_i$ in Eq. (6) is replaced with
$k_i$, we obtain the definition of the weighted clustering coefficient presented in Ref. [...].
Figure 8 presents the relationship between $C^w$ and C of WRAN. The fact that $C^w < C$
signals a network in which the topological clustering is generated by edges with low weight or
by nodes with larger fitness. In this case the clustering has a minor effect on the
organization of the network because the largest part of the interactions occurs on edges not
belonging to interconnected triplets. The figure also indicates that C increases with time,
while $C^w$ stays constant. Interestingly, C increases sharply, by about 10 percent, from 2002
to 2003. This change is consistent with the emergence of the correlation between C(k) and k.
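To make the role of the fitness in the weighted clustering coefficient concrete, the sketch below evaluates a coefficient of the form $c_i^w = \frac{1}{s_i (k_i - 1)} \sum_{j,h} \frac{w_{ij} + w_{ih}}{2} a_{ij} a_{ih} a_{jh}$ with $s_i = \sum_j w_{ij} + \eta_i$. This form is an assumption based on the surrounding description (the source equation is not legible), and the graph format and names are illustrative.

```python
from itertools import combinations


def weighted_clustering(adj, fitness, i):
    """Weighted clustering of node i, normalized by s_i = sum_j w_ij + eta_i.

    adj: {node: {neighbor: w_ij}}; fitness: {node: eta_i} (missing means 0).
    The formula is an assumed reconstruction; nodes with degree below 2
    get 0.0 by convention.
    """
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    s_i = sum(nbrs.values()) + fitness.get(i, 0)
    total = 0.0
    for j, h in combinations(nbrs, 2):
        if h in adj[j]:
            # The ordered pairs (j, h) and (h, j) each contribute
            # (w_ij + w_ih) / 2, so one unordered pair adds w_ij + w_ih.
            total += nbrs[j] + nbrs[h]
    return total / (s_i * (k - 1))


if __name__ == "__main__":
    triangle = {
        "A": {"B": 3, "C": 3},
        "B": {"A": 3, "C": 3},
        "C": {"A": 3, "B": 3},
    }
    print(weighted_clustering(triangle, {}, "A"))        # equal weights, no fitness
    print(weighted_clustering(triangle, {"A": 6}, "A"))  # fitness lowers the value
```

With constant weights and zero fitness the value equals the topological clustering of the triangle, while adding fitness to a node enlarges the denominator and lowers its coefficient, which is the mechanism invoked in the text for the decay of $C^w(k)$.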

Along with the weighted clustering coefficient, we introduce the weighted average
nearest-neighbors degree [13], defined as

    $$ k_{nn,i}^w = \frac{1}{s_i} \sum_{j \in \Gamma(i)} w_{ij} k_j, \tag{7} $$

where the sum runs over the set $\Gamma(i)$ of neighbors of node i. In this case, we perform a
local weighted average of the nearest-neighbor degree according to the normalized weight of the
connecting edges, $w_{ij}/s_i$. This definition implies that $k_{nn,i}^w > k_{nn,i}$ if the
edges with larger weights point to the neighbors with larger degree, and
$k_{nn,i}^w < k_{nn,i}$ in the opposite case. Thus, $k_{nn,i}^w$ measures the effective
affinity to connect with high- or low-degree neighbors according to the magnitude of the actual
interactions. Moreover, $k_{nn}^w(k)$ marks the weighted assortative or disassortative
properties considering the actual interactions among the system's elements. Figure 9 presents
the topological and weighted average nearest-neighbors degree for 1999 and 2004, which
demonstrates that $k_{nn}^w(k) > k_{nn}(k)$ and that both of them tend to increase with the
degree k.

The assortativity coefficient r, defined in Refs. [...], is positive for WRAN, as shown in
Fig. 10, which means that nodes with higher degree tend to connect to each other. Figure 2
tells us that nodes with large degree also have larger strength. Hence, nodes with larger
strength tend to connect to each other.

FIG. 8: (Color online) Topological and weighted clustering coefficient of WRAN from 1999 to
2004.

FIG. 9: (Color online) Topological and weighted average nearest-neighbors degree of WRAN in
1999 and 2004.
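The comparison between the weighted and the topological nearest-neighbor degree can be sketched as follows. The weighted version is taken to be $k_{nn,i}^w = \frac{1}{s_i} \sum_j w_{ij} k_j$ with $s_i$ including the fitness, which is an assumed reading of the definition; the graph format `{node: {neighbor: w_ij}}` and the function names are illustrative.

```python
def topological_nn_degree(adj, i):
    """Plain average of the degrees of the neighbors of node i."""
    nbrs = adj[i]
    if not nbrs:
        raise ValueError("node has no neighbors")
    return sum(len(adj[j]) for j in nbrs) / len(nbrs)


def weighted_nn_degree(adj, fitness, i):
    """Weighted average neighbor degree, (1 / s_i) * sum_j w_ij * k_j.

    s_i = sum_j w_ij + eta_i is assumed, following the strength definition;
    fitness maps node -> eta_i and defaults to 0 for missing nodes.
    """
    nbrs = adj[i]
    s_i = sum(nbrs.values()) + fitness.get(i, 0)
    if s_i == 0:
        raise ValueError("node has zero strength")
    return sum(w * len(adj[j]) for j, w in nbrs.items()) / s_i


if __name__ == "__main__":
    # A is tied strongly (w = 3) to the high-degree node B and weakly
    # (w = 1) to the low-degree node C.
    adj = {
        "A": {"B": 3, "C": 1},
        "B": {"A": 3, "D": 1, "E": 1},
        "C": {"A": 1},
        "D": {"B": 1},
        "E": {"B": 1},
    }
    print(topological_nn_degree(adj, "A"), weighted_nn_degree(adj, {}, "A"))
```

In this toy graph the heavier edge points to the higher-degree neighbor, so the weighted value exceeds the topological one, the same ordering reported for WRAN.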



C. Betweenness 

The communication between two non-adjacent nodes j and k depends on the nodes belonging to the
paths connecting j and k. Consequently, node betweenness is introduced to measure the relevance
of a given node by counting the number of geodesics going through it. Betweenness is one of the
standard measures of node centrality. The betweenness $b_i$ of node i is defined as [...]

    $$ b_i = \sum_{j,k=1,\ j \neq k}^{N} \frac{n_{jk}(i)}{n_{jk}}, \tag{8} $$

where $n_{jk}$ is the number of shortest paths connecting j and k, while $n_{jk}(i)$ is the
number of shortest paths connecting j and k and passing through i. This quantity indicates
which node is the most influential one in the network. The nodes with the highest betweenness
also cause the largest increase in the typical distance between other nodes when they are
removed. The nodes with the largest betweenness are listed in Table 2. These nodes are the most
important ones for information transfer.

FIG. 10: (Color online) Assortative coefficient vs time of WRAN.

FIG. 11: (Color online) Zipf plots of node betweenness for topological WRAN from 1999 to 2004.

FIG. 12: (Color online) Zipf plots of edge betweenness for topological WRAN from 1999 to 2004.
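For the topological (unweighted) network, node betweenness can be computed without enumerating paths by using Brandes' accumulation scheme: a breadth-first search from every source counts shortest paths, and dependencies are then propagated backwards. The sketch below is one such implementation, not necessarily the procedure used for the figures; it sums over ordered pairs (j, k) with j different from k and excludes the endpoints, and the graph format is an assumption.

```python
from collections import deque


def node_betweenness(neighbors):
    """Betweenness b_i = sum over ordered pairs (j, k) of n_jk(i) / n_jk.

    neighbors: {node: iterable of neighbors}, undirected and unweighted.
    Uses Brandes' algorithm; endpoints j and k are not counted.
    """
    b = {v: 0.0 for v in neighbors}
    for s in neighbors:
        sigma = {v: 0 for v in neighbors}   # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in neighbors}  # predecessors on shortest paths
        order = []                          # nodes in order of discovery
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in neighbors}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                b[w] += delta[w]
    return b


if __name__ == "__main__":
    path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
    print(node_betweenness(path))
```

Sorting the resulting values in decreasing order and plotting them against rank gives a Zipf plot of the kind shown for node betweenness.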

The edge betweenness is defined as the number of shortest paths between pairs of nodes that
run through that edge [...].

Table 2. The node with the largest betweenness from 1999 to 2004.

Year  Node with the largest betweenness
1999  Computer-aided design
2000  Intelligent information processing
2001  Management information system
2002  Management information system
2003  Artificial intelligence (AI)
2004  Intelligent information processing (IIP)



III. A MUTUAL SELECTION MODEL

In this section, we present a mutual selection model (MSM) to compare with WRAN. Inspired by
the fitness $\eta_i$ and the mutual selection mechanism, the model is defined as follows. The
model starts from N isolated nodes, each with an initial attractiveness $s_0$. In this paper,
$s_0$ is set to 1. At each time step, the strength of every node increases by 1 with
probability p; with probability (1 - p), each existing node i selects m other existing nodes
for potential interaction according to the probability in Eq. (9). Here, the parameter m is the
number of candidate nodes for creating or strengthening connections, and p is the probability
that a node increases its fitness $\eta_i$ by 1.

    $$ \Pi_{i \to j} = \ldots, \tag{9} $$

where $s_i = \sum_{j \in \Gamma(i)} w_{ij} + \eta_i$. If a pair of unlinked nodes is mutually
selected, then a new connection is built between them. If two connected nodes select each
other, then their existing connection is strengthened, i.e., their edge weight is increased by
1. We will see that the model can generate the observed properties of WRAN. For p = 0.01 and
m = 5, the numerical results at different time steps T are shown in Figs. [...]. Figure
13(a)-(c) give the exponential distributions of degree, node strength and edge weight. Figure
13(d) demonstrates the power-law relationship between degree k and node strength s. Figure
[...] demonstrates the increasing trend of C, the decreasing trends of D and r, and the
$C^w(k) \sim k^{\alpha}$ relationship. From the inset of Fig. [...](b), one can see that when
the time step T is very small there is no correlation between C(k) and k, whereas when T
becomes large the correlation emerges, which is consistent with the behavior of C(k) versus k
in WRAN. Figure [...](b) gives the power-law relationship $C^w(k) \sim k^{\alpha}$, where
$\alpha = 1.11 \pm 0.05$, which is also consistent with that of WRAN. The inset of Fig.
[...](d) gives the Zipf plots of node betweenness at different time steps T. All of the above
structural characteristics of the MSM are approximately consistent with those of WRAN, which
indicates that the mutual selection mechanism and the probability p may be the evolving
mechanism of WRAN.


IV. CONCLUSIONS AND DISCUSSIONS 

We have studied the Chinese Nature Science Basic Research in the management and information
department from the weighted-network point of view. To describe the status of WRAN more
accurately, the requisition papers that have filled in only one subject code are also
considered; their number defines the node fitness. We have looked at a variety of nonlocal
properties of our networks.

Using this measure we have added weighting to WRAN and used the resulting networks to find
which codes have the largest strength and the shortest average distance to others.
Generalization of the clustering coefficient and betweenness calculations to these weighted
networks is also straightforward. The statistical characterization gives the following
conclusions:

(1). The code system was adjusted around 2002, and the correlation between C(k) and k has
     emerged since 2003.

(2). The topological and weighted distances decrease with time, while the clustering
     coefficient increases with time.

(3). The distributions of degree, edge weight and node strength have exponential form.

(4). The larger the node degree is, the larger its fitness is.

(5). WRAN is assortative, which means that nodes with large strength tend to connect to each
     other.



FIG. 13: (Color online) Simulated distributions of degree, node strength and edge weight at
different time steps T. Panel (d) gives the relationship between k and s.

In terms of the structural characteristics of WRAN, the present analysis yields a plausible
model. Based on the mutual selection mechanism and the probability p that a node increases its
strength without creating new connections with others, we presented the MSM. Most of the
structural characteristics of the MSM are consistent with those of WRAN.

The calculations presented in this paper inevitably represent only a small part of the
investigations that could be conducted using large network data sets such as these. We hope,
given the high current level of interest in network phenomena, that others will find many
further uses for these data.